Abstract: Maximum-distance separable (MDS) array codes 
with high rate and an optimal repair property were introduced 
recently. These codes could be applied in distributed storage 
systems, where they minimize the communication and disk access 
required for the recovery of failed nodes. However, the encoding 
and decoding algorithms of the proposed codes use arithmetic 
over finite fields of order greater than 2, which could result in a 
complex implementation. 

In this work, we present a construction of 2-parity MDS array 
codes that allow for optimal repair of a failed information node 
using XOR operations only. The reduction of the field order is 
achieved by allowing more parity bits to be updated when a 
single information bit is changed by the user. 

I. Introduction 

MDS array codes are highly applicable in modern data 
storage systems. Array codes are non-binary erasure codes, 
where each symbol is a column of elements in a two-dimensional 
array, and is stored on a different storage node in the 
system. In traditional erasure codes, the decoder uses all of 
the available codeword symbols for the recovery of erased 
symbols. However, in distributed storage systems, this property 
requires the transmission of an entire array over the network 
for the recovery of failed nodes. And since node failures are 
common, the network load caused by node recovery has become 
a major constraint on the application of erasure codes in such 
systems [5]. 

For that reason, a lot of attention has recently been drawn 
to the minimization of the communication required for node 
recovery. The total amount of information communicated in the 
network during recovery is called the repair bandwidth 
[4]. In this work we focus on the practical case of systematic 
MDS array codes with 2 parity nodes. In this case, when 2 
nodes are erased, the entire information must be transmitted 
in order to repair the erased nodes. However, when only a 
single node is erased, the required repair bandwidth can be 
lower. It was shown in [4] that the repair bandwidth must be 
at least 1/2 of the entire available information in the array. 
Subsequently, several constructions were designed to achieve 
that lower bound [2], [3], [6], [7]. 

Besides the repair bandwidth, another important parameter of 
array codes is the update measure. In systematic array codes, 
the elements of the information nodes are called information 
elements, and those in the parity nodes are called parity 
elements. The update measure is defined as the number of parity 
elements that need to change each time an information element 
is updated. For MDS array codes, the update measure 
cannot be smaller than the number of parity nodes. For the 
codes in [2], [3], [6], [7], the update measure is optimal. 



Another property of these codes is that the elements of the 
nodes belong to a finite field of order at least 3. This property 
can make the codes difficult to implement in hardware. 
However, it was shown in these papers that for MDS codes 
with optimal repair bandwidth and optimal update measure, 
the node elements cannot belong to GF(2). This is the point of 
departure of this work. Instead of designing codes with optimal 
update measure, we focus on the design of codes with node 
elements in GF(2), at the price of a higher update measure. 
This offers a different trade-off that can find a wide range of 
applications. 

The main contribution of this work is a construction of 
systematic MDS array codes with node elements in GF(2). The 
construction has a similar structure to the ones described in 
[2], [3], [6]. The codes have 2 parity nodes, and a failure 
of any information node can be recovered by accessing 
only 1/2 of the available information in the array. Note that in 
general, the amount of accessed information in node recovery 
can be different from the repair bandwidth. Specifically, the 
total access can be higher than the total bandwidth, but not 
lower, since there is no reason to communicate more than what 
is accessed. For that reason, our construction has both optimal 
access and optimal repair bandwidth in the case of a single 
information node failure. However, in the case of a parity node 
failure, the entire information array needs to be transmitted for 
the recovery. But this is not a major drawback, since a parity 
node failure does not reduce the availability of the stored data 
to the user, and thus its recovery can be done offline and does 
not affect the system performance significantly. The update 
measure in our construction is different for different elements. 
For $k$ information nodes, where $k$ is odd, the expected update 
is $\frac{1}{2}\lfloor k/2 \rfloor + 2$, and the worst-case update is $\lfloor k/2 \rfloor + 2$. 

The rest of the paper is organized as follows. In section 
II we demonstrate the key principles of the construction by 
simple examples. Next, the construction is described formally 
in section III with an additional example. Lastly, the properties 
of the construction are proven in section IV, and concluding 
remarks are given in section V. 

II. Demonstrating Examples 

A basic principle of the construction can be demonstrated 
in the case of two information nodes, shown in Figure 1. In 
this case, each of the columns contains two elements. The 
information element in row $i$ and column $j$ is denoted as $a_{i,j}$. 
As is the case in the rest of the paper, there are two parity 
nodes. The first parity node is called the horizontal node, as its 
elements are encoded as the horizontal parities. The horizontal 
element in row $i$ is denoted as $h_i$, and its value is the parity 
of the information elements in row $i$. The summations in the 
table of Figure 1 are taken modulo 2 without mention, as are 
all of the summations of bits in the rest of the paper. 

Figure 1. Decoding a butterfly cycle. 

The second parity node is called the butterfly node, and its 
element in row $i$ is denoted as $b_i$. The reason for the name will 
be clear in the next example. In the figure above the table, the 
horizontal elements correspond to the horizontal lines, and the 
butterfly elements to the diagonal lines. However, as shown in 
the table, the encoding of $b_1$ also contains the element $a_{0,0}$. 
In the figure, this is symbolized by the dark color of $a_{0,1}$, 
which signifies that the element to its right is also added to the 
corresponding butterfly element. 

Now consider the case that column 1 is erased. In this case 
the column can be decoded using the available elements of 
row 0 only, by setting $a_{0,1} = h_0 + a_{0,0}$ and $a_{1,1} = b_0 + a_{0,0}$. 
Since the decoder accesses only half of the elements in each 
available column, and only half of the available information 
in total, we say that the access ratio is 1/2. 

Since we claim that the code is MDS, consider the case 
that both information nodes are erased. Notice that if $a_{0,0}$ was 
not included in $b_1$, the code could not recover from the loss 
of the two information nodes. However, the addition of $a_{0,0}$ 
to $b_1$, which corresponds to the dark element $a_{0,1}$, allows us to 
break the cycle and create a decoding chain. From $h_0 + b_1$ 
we obtain $a_{1,0}$. In the decoding chain that remains in Figure 1, 
if we eliminate the diagonal $a_{0,1}/a_{1,0}$, we have all the 
segments and the end element, and therefore all the other three 
elements can be decoded: $a_{1,1} = h_1 + a_{1,0}$, then 
$a_{0,0} = b_0 + a_{1,1}$, and finally $a_{0,1} = h_0 + a_{0,0}$. Notice that 
the addition of $a_{0,0}$ also increases the update measure. If the 
user wants to change the value of $a_{0,0}$, the encoder needs to 
update the element $b_1$, in addition to $h_0$ and $b_0$. The code in 
Figure 1 is also the simplest version of the EVENODD code [1]. 
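The two-column code of Figure 1 is small enough to check exhaustively. The following minimal Python sketch (the function names are hypothetical, chosen for illustration) encodes a $2 \times 2$ array with the parities described above, repairs column 1 from row 0 only, and runs the decoding chain when both information columns are erased.

```python
from itertools import product


def encode_k2(a):
    """Parities of the code in Figure 1; a[i][j] is the bit in row i, column j."""
    h = [a[0][0] ^ a[0][1], a[1][0] ^ a[1][1]]  # horizontal parities h_0, h_1
    b = [
        a[0][0] ^ a[1][1],            # b_0: butterfly line starting at a_{0,0}
        a[1][0] ^ a[0][1] ^ a[0][0],  # b_1: line of a_{1,0}, plus a_{0,0} for the dark a_{0,1}
    ]
    return h, b


def repair_column1(a, h, b):
    """Single failure of column 1: only row 0 (a_{0,0}, h_0, b_0) is read."""
    return [h[0] ^ a[0][0], b[0] ^ a[0][0]]


def decode_both_columns(h, b):
    """Both information columns erased: follow the decoding chain."""
    a10 = h[0] ^ b[1]   # a_{0,0} and a_{0,1} cancel in h_0 + b_1
    a11 = h[1] ^ a10
    a00 = b[0] ^ a11
    a01 = h[0] ^ a00
    return [[a00, a01], [a10, a11]]


# Exhaustive check over all 16 information arrays.
for w, x, y, z in product((0, 1), repeat=4):
    a = [[w, x], [y, z]]
    h, b = encode_k2(a)
    assert repair_column1(a, h, b) == [a[0][1], a[1][1]]
    assert decode_both_columns(h, b) == a
print("all 16 arrays recovered")
```

Removing `^ a[0][0]` from `b[1]` makes `h_0 + h_1 + b_0 + b_1` vanish, which is the cycle that the dark element breaks.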

Now consider the case of 3 information nodes. In this case, 
the construction requires that the nodes contain 4 elements, 
where in general, for $k$ information nodes, the number of 
elements is $2^{k-1}$. Although the size of the column is exponential 
in the number of columns, this is still practical because the 
number of storage nodes is typically between 10 and 20, 
and each element of a column is a single bit. 

The horizontal elements are encoded in the same way as 
before, as the parity of the rows. The butterfly node is now 
encoded in correspondence with its name, where each $b_i$ is 
encoded according to the line in the butterfly diagram of 
Figure 2 that starts at element $a_{i,0}$. Note that we draw the 
butterfly with column 0 on the right side. The element $b_i$ 
is encoded as the parity of the elements in this line, and in 
addition, if there are dark elements in the line, extra elements 
are added to $b_i$ according to Figure 2. For each dark element 
in the line, the element to its right (cyclically) is also added to 
$b_i$. In the general case of $k$ information nodes, the $\lfloor k/2 \rfloor$ elements 
to the right of a dark element are added (for odd $k$; see details 
in section III). The careful addition of extra elements in the 
butterfly parity, corresponding to the dark elements, is what 
allows the computation to be done in GF(2). In this example, 
$b_0 = a_{0,0} + a_{1,1} + a_{3,2} + a_{0,2}$. The elements $a_{0,0}$, $a_{1,1}$, $a_{3,2}$ 
come from the butterfly line; additionally, since $a_{0,0}$ is dark, 
the element to its right (cyclically), $a_{0,2}$, is also added. Similarly, 

$$b_2 = a_{2,0} + a_{3,1} + a_{1,2} + a_{2,2} + a_{3,0} + a_{1,1},$$

where all three elements of the line are dark, so $a_{2,2}$, $a_{3,0}$ and $a_{1,1}$ are added. 

Figure 2. The encoding of the butterfly node. 
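The term lists of the butterfly parities can be generated mechanically. The following minimal Python sketch, with hypothetical helper names, lists the terms of every $b_m$ for $k = 3$. It assumes the dark-element rule given in the next paragraph, and reads "the element to its right" as the next lower column index, cyclically, which matches the drawing with column 0 on the right.

```python
def bit(i, j):
    """Bit j of i (bit 0 is the least significant); 0 for j < 0."""
    return (i >> j) & 1 if j >= 0 else 0


def is_dark(i, j):
    """a_{i,j} is dark when bit j of i equals bit j-1 of i."""
    return bit(i, j) == bit(i, j - 1)


def parity_terms(i, j, k):
    """Terms contributed by a_{i,j}: itself, plus the floor(k/2) elements
    cyclically to its right (lower column indices) if it is dark."""
    if not is_dark(i, j):
        return [(i, j)]
    return [(i, c) for c in range(k) if (j - c) % k <= k // 2]


def butterfly_terms(m, k):
    """All terms of b_m: the butterfly line meets row m XOR (2^j - 1) in column j."""
    terms = []
    for j in range(k):
        terms.extend(parity_terms(m ^ ((1 << j) - 1), j, k))
    return terms


k = 3
for m in range(1 << (k - 1)):
    print(f"b_{m} =", " + ".join(f"a[{i}][{j}]" for i, j in butterfly_terms(m, k)))
```

The row index `m ^ ((1 << j) - 1)` is an assumption read off the two worked parities, and it reproduces both $b_0$ and $b_2$.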

The dark elements in Figure 2 are those $a_{i,j}$ for which the 
$j$-th bit in the binary representation of $i$ over $k-1$ bits is 
equal to the $(j-1)$-th bit, where the $(-1)$-th bit is considered 
as 0. For example, $a_{0,1}$ is dark since bit 1 of 0 is equal 
to bit 0 of 0. 

Now consider the case that node 1 fails 
and needs to be reconstructed. The decoding method for a 
single node failure is simple: recover the dark elements by 
the horizontal parities, and the white elements by the butterfly 
parity. In the example, we set $a_{0,1} = h_0 + a_{0,0} + a_{0,2}$ and 
$a_{3,1} = h_3 + a_{3,0} + a_{3,2}$, and the dark elements are recovered. 
For the white elements, we set $a_{1,1} = b_0 + a_{0,0} + a_{3,2} + a_{0,2}$ 
and $a_{2,1} = b_3 + a_{3,0} + a_{0,2} + a_{0,1}$ (where $a_{0,1}$ was recovered 
by the horizontal parity). Notice that according to this method, 
the decoder only accesses rows 0 and 3, and the access ratio is 
1/2. 
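The single-failure rule of this example can be checked directly. This is a minimal Python sketch under the same assumptions as the worked parities (helper names are hypothetical): dark elements of the failed column come from the horizontal parity, white ones from the butterfly parity, and the sketch records which rows of the surviving information columns it reads.

```python
import random


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def is_dark(i, j):
    return bit(i, j) == bit(i, j - 1)


def butterfly_terms(m, k):
    """Terms of b_m: line through row m XOR (2^j - 1) in column j, plus dark extensions."""
    terms = []
    for j in range(k):
        i = m ^ ((1 << j) - 1)
        if is_dark(i, j):
            terms.extend((i, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            terms.append((i, j))
    return terms


def encode(a, k):
    rows = 1 << (k - 1)
    h = [sum(a[i]) % 2 for i in range(rows)]
    b = [sum(a[i][j] for i, j in butterfly_terms(m, k)) % 2 for m in range(rows)]
    return h, b


def repair(a, h, b, j, k):
    """Rebuild column j without reading it; returns (column, information rows read)."""
    rows = 1 << (k - 1)
    col, read = [None] * rows, set()
    for i in range(rows):                 # dark elements: horizontal parity
        if is_dark(i, j):
            col[i] = (h[i] + sum(a[i][c] for c in range(k) if c != j)) % 2
            read.add(i)
    for i in range(rows):                 # white elements: butterfly parity
        if not is_dark(i, j):
            m = i ^ ((1 << j) - 1)
            v = b[m]
            for r, c in butterfly_terms(m, k):
                if (r, c) == (i, j):
                    continue
                if c == j:
                    v ^= col[r]           # a dark element of column j, recovered above
                else:
                    v ^= a[r][c]
                    read.add(r)
            col[i] = v
    return col, read


random.seed(7)
k, j = 3, 1
a = [[random.randint(0, 1) for _ in range(k)] for _ in range(1 << (k - 1))]
h, b = encode(a, k)
erased = [[None if c == j else x for c, x in enumerate(row)] for row in a]
col, read = repair(erased, h, b, j, k)
print(col == [row[j] for row in a], sorted(read))
```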

Now consider the case that nodes 0 and 1 fail. We can see 
in Figure 2 that there are two decoding cycles, the cycle of 
rows 0 and 1, and that of rows 2 and 3. For this decoding, 
we can ignore the fact that $a_{0,0}$ and $a_{2,0}$ are dark, since the 
added elements are in column 2, which is available. Therefore, 
the top cycle becomes identical to the previous example, and 
can be decoded in the same way. Note that the bottom cycle 
could not be decoded before the top one. That is because the 
dark elements of column 2 imply that $a_{0,1}$ and $a_{1,1}$ are added 
to the butterfly lines of the bottom cycle, and since they are 
unknown, the cycle cannot be decoded. However, after the 
decoding of the top cycle, the bottom cycle can be decoded 
in the same way. In the case of more information columns, 
the order needed for the decoding of the cycles is related to a 
binary reflected Gray code, and is described in the next section. 
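The two-cycle decoding can be written out explicitly for $k = 3$. In this minimal Python sketch (hypothetical names), the expressions for $b_1$ and $b_3$ are not given in the text; they are derived from the butterfly rule and are computed by the encoder below, so the explicit formulas are checked against it on all $2^{12}$ arrays.

```python
from itertools import product


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def butterfly_terms(m, k):
    terms = []
    for j in range(k):
        i = m ^ ((1 << j) - 1)
        if bit(i, j) == bit(i, j - 1):        # dark: add the floor(k/2) elements to its right
            terms.extend((i, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            terms.append((i, j))
    return terms


def encode(a, k=3):
    h = [sum(row) % 2 for row in a]
    b = [sum(a[i][j] for i, j in butterfly_terms(m, k)) % 2 for m in range(len(a))]
    return h, b


def decode_columns_0_1(c2, h, b):
    """k = 3 with columns 0 and 1 erased; c2[i] = a_{i,2} survives."""
    a = [[None, None, c2[i]] for i in range(4)]
    # Top cycle (rows 0, 1): every extra term is in column 2, so this is Figure 1's chain.
    a[1][0] = h[0] ^ b[1] ^ c2[0] ^ c2[2]              # b_1 = a10 + a01 + a00 + a22
    a[1][1] = h[1] ^ a[1][0] ^ c2[1]
    a[0][0] = b[0] ^ a[1][1] ^ c2[3] ^ c2[0]           # b_0 = a00 + a11 + a32 + a02
    a[0][1] = h[0] ^ a[0][0] ^ c2[0]
    # Bottom cycle (rows 2, 3): b_2 contains a11 and b_3 contains a01, known only now.
    a[2][0] = b[2] ^ c2[1] ^ c2[2] ^ a[1][1] ^ h[3] ^ c2[3]  # b_2 = a20+a31+a12+a22+a30+a11
    a[2][1] = h[2] ^ a[2][0] ^ c2[2]
    a[3][0] = b[3] ^ a[2][1] ^ c2[0] ^ a[0][1]         # b_3 = a30 + a21 + a02 + a01
    a[3][1] = h[3] ^ a[3][0] ^ c2[3]
    return a


count = 0
for bits in product((0, 1), repeat=12):
    a = [list(bits[3 * i:3 * i + 3]) for i in range(4)]
    h, b = encode(a)
    count += decode_columns_0_1([row[2] for row in a], h, b) == a
print(count, "of 4096 arrays recovered")
```

Swapping the two halves of `decode_columns_0_1` fails, since the bottom half reads `a[1][1]` and `a[0][1]`.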

III. Code Construction 

For the presentation of the construction we use extra 
notation. Let $[n] = \{0, \ldots, n-1\}$. For integers $i$ and $j$, 
$i \oplus j$ denotes the bitwise XOR operation between the binary 
representations of $i$ and $j$, where the result is converted back 
to an integer. The expression $i(j)$ denotes the $j$-th bit in the 
binary representation of $i$, where $i(0)$ is the least significant 
bit. If $j$ is negative, $i(j)$ is defined to be 0. The construction 
requires that $k$ is odd. If the number of information nodes is 
even, assume that there is an extra node, where the values of all 
its entries are 0. The construction is now described formally. 
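The notation maps onto a few integer operations. This minimal Python sketch uses hypothetical helper names. `bit` is $i(j)$, `flip_low_bits` is $i \oplus (2^j - 1)$, and `pad_to_odd` appends the all-zero information node when the number of nodes is even.

```python
def bit(i: int, j: int) -> int:
    """i(j): the j-th bit of i, bit 0 least significant; defined as 0 for j < 0."""
    return (i >> j) & 1 if j >= 0 else 0


def flip_low_bits(i: int, j: int) -> int:
    """i XOR (2^j - 1): complements the j least significant bits of i."""
    return i ^ ((1 << j) - 1)


def pad_to_odd(columns: list) -> list:
    """If the number of information columns is even, append an all-zero column."""
    if len(columns) % 2 == 0:
        return columns + [[0] * len(columns[0])]
    return columns


print(bit(6, 1), bit(6, 0), bit(6, -1))   # bits of 6 = 110_2 -> 1 0 0
print(flip_low_bits(2, 2))                 # 10_2 XOR 11_2 = 01_2 -> 1
```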

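Construction 1. For each pair $(i,j) \in [2^{k-1}] \times [k]$, define a set 
$B_{i,j}$ as follows. If $i(j) \neq i(j-1)$, let $B_{i,j} = \{(i,j)\}$. Else, let 

$$B_{i,j} = \{(i,j') : (j - j') \bmod k \le \lfloor k/2 \rfloor\}.$$

Next, let $i_j = i \oplus (2^j - 1)$, and for each $i \in [2^{k-1}]$, define a 
set 

$$B_i = \bigcup_{j \in [k]} B_{i_j, j}.$$

Encoding: For each $i \in [2^{k-1}]$, set 

$$h_i = \sum_{j \in [k]} a_{i,j}, \qquad b_i = \sum_{(i',j') \in B_i} a_{i',j'}.$$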
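A direct transcription of the encoding as a minimal Python sketch (helper names are hypothetical): `B_elem` builds $B_{i,j}$, `B_row` builds $B_i$, and `encode` returns the horizontal and butterfly columns. Because the parts $B_{i_j,j}$ of $B_i$ lie in distinct rows, XOR over the list equals the sum over the set.

```python
import random


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def B_elem(i, j, k):
    """B_{i,j}: {(i,j)} if i(j) != i(j-1); else every (i,j') with (j - j') mod k <= floor(k/2)."""
    if bit(i, j) != bit(i, j - 1):
        return [(i, j)]
    return [(i, jp) for jp in range(k) if (j - jp) % k <= k // 2]


def B_row(i, k):
    """B_i: union over j of B_{i_j, j}, where i_j = i XOR (2^j - 1)."""
    cells = []
    for j in range(k):
        cells.extend(B_elem(i ^ ((1 << j) - 1), j, k))
    return cells


def encode(a, k):
    """Return (h, b) for an information array with 2^(k-1) rows and k (odd) columns."""
    assert k % 2 == 1 and len(a) == 1 << (k - 1)
    h, b = [], []
    for i in range(len(a)):
        hi = 0
        for j in range(k):
            hi ^= a[i][j]
        bi = 0
        for r, c in B_row(i, k):
            bi ^= a[r][c]
        h.append(hi)
        b.append(bi)
    return h, b


random.seed(1)
k = 5
a = [[random.randint(0, 1) for _ in range(k)] for _ in range(1 << (k - 1))]
h, b = encode(a, k)
print("h =", h)
print("b =", b)
```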

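Single failure decoding: If the failed node is a parity node, 
use the encoding method. If information node $j$ failed, for each 
$i \in [2^{k-1}]$ recover $a_{i,j}$ as follows. If $i(j-1) = i(j)$, set 
$a_{i,j} = h_i + \sum_{j' \neq j} a_{i,j'}$. Else, set 

$$a_{i,j} = b_{i_j} + \sum_{(i',j') \in B_{i_j} \setminus \{(i,j)\}} a_{i',j'}.$$

The elements with $i(j-1) = i(j)$ are recovered first, since the 
sum in the second case may contain such elements of column $j$. 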
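The single-failure rule can be sketched for any odd $k$. In this minimal Python version (hypothetical names), the erased column is passed as `None` entries and is never read. The elements with $i(j-1) = i(j)$ are rebuilt before the others, because their values can appear in the butterfly sums.

```python
import random


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def B_row(i, k):
    """Cells of B_i from Construction 1."""
    cells = []
    for j in range(k):
        r = i ^ ((1 << j) - 1)
        if bit(r, j) == bit(r, j - 1):
            cells.extend((r, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            cells.append((r, j))
    return cells


def encode(a, k):
    h = [sum(row) % 2 for row in a]
    b = [sum(a[r][c] for r, c in B_row(i, k)) % 2 for i in range(len(a))]
    return h, b


def repair_information_node(a, h, b, j, k):
    """Single-failure decoding of information node j; a[.][j] is never read."""
    rows = 1 << (k - 1)
    col = [None] * rows
    dark = [i for i in range(rows) if bit(i, j - 1) == bit(i, j)]
    white = [i for i in range(rows) if bit(i, j - 1) != bit(i, j)]
    for i in dark:                                   # a_{i,j} = h_i + sum_{j' != j} a_{i,j'}
        col[i] = (h[i] + sum(a[i][c] for c in range(k) if c != j)) % 2
    for i in white:                                  # a_{i,j} = b_{i_j} + sum over B_{i_j} minus (i,j)
        ij = i ^ ((1 << j) - 1)
        v = b[ij]
        for r, c in B_row(ij, k):
            if (r, c) != (i, j):
                v ^= col[r] if c == j else a[r][c]  # column-j terms are dark, already rebuilt
        col[i] = v
    return col


random.seed(2)
k, j = 5, 3
a = [[random.randint(0, 1) for _ in range(k)] for _ in range(1 << (k - 1))]
h, b = encode(a, k)
erased = [[None if c == j else x for c, x in enumerate(row)] for row in a]
print(repair_information_node(erased, h, b, j, k) == [row[j] for row in a])
```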

Double failure decoding: If both failed nodes are parity 
nodes, use the encoding method. If one of them is the butterfly 
node and the other is the information node $j$, then for each 
$i \in [2^{k-1}]$, set $a_{i,j} = h_i + \sum_{j' \neq j} a_{i,j'}$, and then encode the 
butterfly node. 

If the horizontal node fails together with the information 
node $j$, decode as follows: For $i = 0, 1, \ldots, 2^{k-1} - 1$, find 
$i'$ according to Algorithm 1 and set 

$$a_{i',j} = b_{i'_j} + \sum_{(i'',j'') \in B_{i'_j} \setminus \{(i',j)\}} a_{i'',j''}.$$

After node $j$ is decoded, encode the horizontal node. 
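When the horizontal node is also lost, column $j$ must be rebuilt from the butterfly node alone, in the order produced by Algorithm 1. The following minimal Python sketch (hypothetical names) transcribes our reading of Algorithm 1. The garbled listing is missing its line 5, which we take to be $i'(j') \leftarrow i'(j'+1) + i(j')$. If the order were wrong, the decoder would XOR an entry that is still `None`, so a clean run is itself a check.

```python
import random


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def B_row(i, k):
    cells = []
    for j in range(k):
        r = i ^ ((1 << j) - 1)
        if bit(r, j) == bit(r, j - 1):
            cells.extend((r, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            cells.append((r, j))
    return cells


def encode(a, k):
    h = [sum(row) % 2 for row in a]
    b = [sum(a[r][c] for r, c in B_row(i, k)) % 2 for i in range(len(a))]
    return h, b


def algorithm1(i, j, k):
    """Row i' handled in iteration i (Algorithm 1; line 5 is our reconstruction)."""
    ip = [0] * k                                  # ip[t] = i'(t); ip[k-1] = 0
    for t in range(k - 2, j - 1, -1):             # t = k-2 down to j
        ip[t] = ip[t + 1] ^ bit(i, t)
    for t in range(j):                            # t = 0 up to j-1
        ip[t] = (ip[t - 1] if t > 0 else 0) ^ bit(i, t)
    return sum(ip[t] << t for t in range(k - 1))


def decode_without_horizontal(a, b, j, k):
    """Horizontal node and information node j lost: rebuild column j from b only."""
    rows = 1 << (k - 1)
    col = [None] * rows
    for i in range(rows):
        r0 = algorithm1(i, j, k)
        m = r0 ^ ((1 << j) - 1)                   # butterfly element that holds a_{r0,j}
        v = b[m]
        for r, c in B_row(m, k):
            if (r, c) != (r0, j):
                v ^= col[r] if c == j else a[r][c]  # col[r] must come from an earlier iteration
        col[r0] = v
    return col


random.seed(4)
k, j = 5, 1
a = [[random.randint(0, 1) for _ in range(k)] for _ in range(1 << (k - 1))]
_, b = encode(a, k)
erased = [[None if c == j else x for c, x in enumerate(row)] for row in a]
print(decode_without_horizontal(erased, b, j, k) == [row[j] for row in a])
```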

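Finally, if two information nodes failed, denote their indices 
as $j_0, j_1$ such that $(j_1 - j_0) \bmod k \le \lfloor k/2 \rfloor$ (since $k$ is odd, 
one of the two orderings always satisfies this). Next, for $i = 
0, 1, \ldots, 2^{k-2} - 1$, find $i_0, i_1$ according to Algorithm 2 and set, 
sequentially, 

$$a_{i_1,j_0} = h_{i_0} + \sum_{j' \in [k] \setminus \{j_0,j_1\}} a_{i_0,j'} + b_{i_{0,j_1}} + \sum_{(i',j') \in B_{i_{0,j_1}} \setminus \{(i_1,j_0),(i_0,j_0),(i_0,j_1)\}} a_{i',j'} \tag{1}$$

$$a_{i_1,j_1} = h_{i_1} + \sum_{j' \in [k] \setminus \{j_1\}} a_{i_1,j'} \tag{2}$$

$$a_{i_0,j_0} = b_{i_{0,j_0}} + \sum_{(i',j') \in B_{i_{0,j_0}} \setminus \{(i_0,j_0),(i_1,j_1)\}} a_{i',j'} \tag{3}$$

$$a_{i_0,j_1} = h_{i_0} + \sum_{j' \in [k] \setminus \{j_1\}} a_{i_0,j'} \tag{4}$$

where $i_{m,j'} = i_m \oplus (2^{j'} - 1)$ for $m \in \{0,1\}$, in analogy with $i_j$. 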



Algorithm 1 Find $i'$ 
1: Inputs: $i \in [2^{k-1}]$ 
2: Output: $i' \in [2^{k-1}]$ 
3: $i'(k-1) \leftarrow 0$ 
4: for $j' = k-2$ to $j' = j$ do 
5: $i'(j') \leftarrow i'(j'+1) + i(j')$ 
6: end for 
7: for $j' = 0$ to $j' = j-1$ do 
8: $i'(j') \leftarrow i'(j'-1) + i(j')$ 
9: end for 
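A minimal Python transcription of Algorithm 1 follows, with hypothetical names. Line 5 is our reconstruction of the missing listing line. The helper `white_pattern` reads a row as a binary number (dark = 0, white = 1, lowest remaining column as the least significant bit). The tests use it to check the visual interpretation: iteration $i$ handles the unique row whose pattern, with column $j$ removed, equals $i$.

```python
def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def algorithm1(i, j, k):
    """Algorithm 1: iteration i -> row i' for failed column j (line 5 reconstructed)."""
    ip = [0] * k                            # line 3: i'(k-1) = 0
    for t in range(k - 2, j - 1, -1):       # lines 4-6: t = k-2 down to j
        ip[t] = ip[t + 1] ^ bit(i, t)
    for t in range(j):                      # lines 7-9: t = 0 up to j-1
        ip[t] = (ip[t - 1] if t > 0 else 0) ^ bit(i, t)
    return sum(ip[t] << t for t in range(k - 1))


def white_pattern(row, skip, k):
    """Read a row as a number: dark element -> 0, white -> 1, skipping the columns
    in `skip`; the lowest remaining column is the least significant bit."""
    value, pos = 0, 0
    for c in range(k):
        if c in skip:
            continue
        value |= (bit(row, c) ^ bit(row, c - 1)) << pos
        pos += 1
    return value


k, j = 5, 2
print([algorithm1(i, j, k) for i in range(1 << (k - 1))])
```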



Algorithm 2 Find $i_m$ 
1: Inputs: $m \in \{0,1\}$, $i \in [2^{k-2}]$ 
2: Output: $i_m \in [2^{k-1}]$ 
3: $i_m(k-1) \leftarrow 0$ 
4: $s \leftarrow \arg\max_{m' \in \{0,1\}} \{j_{m'}\}$ 
5: for $j = k-2$ to $j = j_s$ do 
6: $i_m(j) \leftarrow i_m(j+1) + i(j-1)$ 
7: end for 
8: for $j = 0$ to $j = j_{1-s} - 1$ do 
9: $i_m(j) \leftarrow i_m(j-1) + i(j)$ 
10: end for 
11: $i_m(j_1 - s) \leftarrow i_m(j_1 + s - 1) + m$ 
12: if $j_s - j_{1-s} > 1$ then 
13: for $j = j_1 + 1 - 3s$ to $j_0 + s - 1$ do 
14: $i_m(j) \leftarrow i_m(j + 2s - 1) + i(j + s - 1)$ 
15: end for 
16: end if 
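Here is a minimal Python transcription of Algorithm 2 (hypothetical names). The loop directions are our reading of the listing: for $s = 1$, the loop in lines 13-15 runs downward from $j_1 - 2$ to $j_0$; for $s = 0$, it runs upward from $j_1 + 1$ to $j_0 - 1$. The call below reproduces Example 1 ($k = 5$, $j_0 = 0$, $j_1 = 2$, $i = 2$). The tests also check that $i_1 = i_0 \oplus (2^{j_0} - 1) \oplus (2^{j_1} - 1)$, that $a_{i_0,j_1}$ is dark, and that every row is covered exactly once.

```python
def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def algorithm2(m, i, j0, j1, k):
    """Row i_m for iteration i, given erased columns j0, j1 with (j1 - j0) mod k <= k // 2."""
    js, r = (j0, j1), [0] * k                      # line 3: i_m(k-1) = 0

    def get(t):                                    # i_m(t), with i_m(-1) = 0
        return r[t] if t >= 0 else 0

    s = 0 if j0 > j1 else 1                        # line 4: index of the larger column
    for t in range(k - 2, js[s] - 1, -1):          # lines 5-7
        r[t] = r[t + 1] ^ bit(i, t - 1)
    for t in range(js[1 - s]):                     # lines 8-10
        r[t] = get(t - 1) ^ bit(i, t)
    r[j1 - s] = get(j1 + s - 1) ^ m                # line 11: a_{i_m, j1} is dark iff m = 0
    if js[s] - js[1 - s] > 1:                      # lines 12-16
        ts = range(j1 - 2, j0 - 1, -1) if s == 1 else range(j1 + 1, j0)
        for t in ts:
            r[t] = r[t + 2 * s - 1] ^ bit(i, t + s - 1)
    return sum(r[t] << t for t in range(k - 1))


print(algorithm2(0, 2, 0, 2, 5), algorithm2(1, 2, 0, 2, 5))   # Example 1 -> 7 4
```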



Example 1. Let the number of information nodes be 4, and 
therefore $k = 5$ (the next odd number). Now let $j_0 = 0$, $j_1 = 2$ 
be the failed nodes. Notice that $j_1 - j_0 = 2 \le \lfloor k/2 \rfloor = 2$, 
as required. Assume that iterations 0 and 1 of the decoding 
were already performed, and now $i = 2 = 10_2$. To find $i_0$ 
and $i_1$, we perform Algorithm 2. By lines 3 and 4, $i_0(4) = 0$, 
and $s = 1$. In lines 5-7, we first set $j = 3$, and $i_0(3) = 
i_0(4) + i(2) = 0 + 0 = 0$. Next, $i_0(2) = i_0(3) + i(1) = 0 + 1 = 1$, and the 
loop is finished. Since $j_0 = 0$, we skip the loop in lines 8-10. 
In line 11, we set $i_0(1) = i_0(2) + m = 1 + 0 = 1$. In lines 13-15, 
$j = 0$, and $i_0(0) = i_0(1) + i(0) = 1 + 0 = 1$, and in 
conclusion, $i_0 = 111_2 = 7$. It can be verified that Algorithm 2 
always sets $i_1 = i_0 \oplus (2^{j_0} - 1) \oplus (2^{j_1} - 1)$, and in this case, 
$i_1 = i_0 \oplus 11_2 = 100_2 = 4$. 

Algorithm 2 can be interpreted visually by observing Figure 
3. In each row, consider the dark elements as 0s, and the white 
elements as 1s, and ignore columns $j_0$ and $j_1$, reading the 
remaining columns with the lowest column index as the least 
significant bit. The rows $i_0$ and 
$i_1$ are the ones for which the binary number resulting from this 
observation is equal to $i$. In the current example, both rows 4 
and 7 have a dark element in column 1, and a white element 
in column 3, corresponding to $i = 10_2$. Note that $a_{4,2}$ and $a_{7,0}$ 
are in the same butterfly line, and the same is true for $a_{4,0}$ and 
$a_{7,2}$. The butterfly line of $a_{4,0}$ contains the white element $a_{5,1}$ 
and the dark element $a_{3,3}$, which are both available. Since $a_{3,3}$ 
is dark, the lost element $a_{3,2}$ and the available element $a_{3,1}$ are also 
included in the same parity. So successful decoding could be 
made only if row 3 was already decoded in a previous iteration. 
But it is easy to find by observation the iteration in which row 3 
is decoded. Since both $a_{3,1}$ and $a_{3,3}$ are dark, row 3 was decoded 
in iteration $00_2 = 0$, which is earlier than $i = 2$. Applying the 
same argument to the line of $a_{7,0}$, we can see that all of its 
elements in rows other than 4 and 7 are available. 

Figure 3. The butterfly construction with 4 information nodes. 

To start the decoding, Algorithm 2 chooses $i_0$ as the row with 
a dark element in column $j_1$, such that $a_{i_1,j_0}$ can be the first 
decoded element. In this example, $i_0 = 7$, and indeed $a_{7,2}$ is 
a dark element. Therefore, the decoding of rows 4 and 7 can 
follow directly as described by Equations (1)-(4). We also note 
that Algorithm 1 has the same visual interpretation. 
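The visual argument of Example 1 can be checked by listing, for the two butterfly lines used in iteration $i = 2$ ($b_4$ and $b_7$), every term in a lost column that does not lie in rows 4 or 7. The iteration of each such term is then read off its dark/white pattern. This is a minimal Python sketch with hypothetical names, and its butterfly rule follows the reading used in the worked parities.

```python
def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def butterfly_terms(m, k):
    terms = []
    for j in range(k):
        i = m ^ ((1 << j) - 1)
        if bit(i, j) == bit(i, j - 1):
            terms.extend((i, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            terms.append((i, j))
    return terms


def white_pattern(row, skip, k):
    """Dark -> 0, white -> 1 over the columns not in skip; lowest column = least significant bit."""
    value, pos = 0, 0
    for c in range(k):
        if c not in skip:
            value |= (bit(row, c) ^ bit(row, c - 1)) << pos
            pos += 1
    return value


k, j0, j1, i = 5, 0, 2, 2
for m in (4, 7):    # b_4 holds a_{4,0}, a_{7,2}; b_7 holds a_{7,0}, a_{4,2}
    for r, c in butterfly_terms(m, k):
        if c in (j0, j1) and r not in (4, 7):
            it = white_pattern(r, {j0, j1}, k)
            print(f"b_{m}: lost a[{r}][{c}] is decoded in iteration {it} < {i}")
```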

IV. Code Properties 

In this section we show that the codes have optimal access, 
that they are MDS, and present their update measure. The 
first Theorem proves that the single failure decoding function 
of Construction 1 accesses only half of the elements in each 
surviving node, and thus Construction 1 is said to be 
"repair-optimal". 

Theorem 1. Optimal Repair: The single failure decoding 
function of Construction 1 decodes any failed information node 
correctly, and it accesses only 1/2 of the elements in each of 
the surviving nodes. 

Proof: Let $j$ be the failed node. First, note that the fraction 
of elements $i \in [2^{k-1}]$ such that $i(j-1) = i(j)$ is 1/2, and therefore 
the decoder accesses half of the elements in the horizontal 
node. Next, note that when $j$ is fixed, the function $f(i) = i \oplus 
(2^j - 1)$ is a permutation, and therefore the decoder also accesses 
only half of the elements in the butterfly node. Finally, we will 
show that for each accessed element $a_{i',j'}$ in the information 
nodes, $i'(j-1) = i'(j)$, and thus the decoder only accesses half 
of the elements in each node, and the repair ratio is 1/2. 

Let $a_{i,j}$ be a decoded element, and $a_{i',j'}$ be an element that is 
accessed in the decoding process. If $i(j-1) = i(j)$, then $i' = i$ 
by the decoding function, and thus $i'(j-1) = i'(j)$. Else, note 
that by the encoding process, $i' = i \oplus (2^j - 1) \oplus (2^{\ell} - 1)$, 
where $\ell$ is the column for which $a_{i',j'}$ belongs to $B_{i',\ell}$; here 
$\ell \neq j$, since for $\ell = j$ we would get $i' = i$, and $B_{i,j} = \{(i,j)\}$ 
because $a_{i,j}$ is white. If $\ell > j$, then $i'(j-1) = i(j-1)$, and 
$i'(j) = i(j) + 1$. And if $\ell < j$, then $i'(j-1) = i(j-1) + 1$, 
and $i'(j) = i(j)$. In both cases, 

$$i'(j-1) + i'(j) = i(j-1) + i(j) + 1 = 1 + 1 = 0,$$

and therefore $i'(j) = i'(j-1)$, and the proof is completed. ■ 
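The counting in Theorem 1 can be checked without any data, by recording which elements the single-failure decoder reads. This minimal Python sketch (hypothetical names) collects the horizontal indices, the butterfly indices and the information cells read when node $j$ fails. The tests confirm that each surviving node contributes exactly half of its elements, and that every row read satisfies $i'(j) = i'(j-1)$.

```python
def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def butterfly_terms(m, k):
    terms = []
    for j in range(k):
        i = m ^ ((1 << j) - 1)
        if bit(i, j) == bit(i, j - 1):
            terms.extend((i, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            terms.append((i, j))
    return terms


def access_pattern(j, k):
    """Parity and information elements read by the single-failure decoder of node j."""
    rows = 1 << (k - 1)
    h_read, b_read, info_read = set(), set(), set()
    for i in range(rows):
        if bit(i, j - 1) == bit(i, j):                  # recovered from h_i
            h_read.add(i)
            info_read |= {(i, c) for c in range(k) if c != j}
        else:                                           # recovered from b_{i_j}
            m = i ^ ((1 << j) - 1)
            b_read.add(m)
            info_read |= {(r, c) for r, c in butterfly_terms(m, k) if c != j}
    return h_read, b_read, info_read


for k in (3, 5, 7):
    h_read, b_read, info_read = access_pattern(1, k)
    print(k, len(h_read), len(b_read), len(info_read), "of", (1 << (k - 1)) * (k - 1))
```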
The next Theorem verifies the MDS property of the 
Construction. 

Theorem 2. MDS: The double failure decoding function of 
Construction 1 decodes the failure of any two nodes correctly. 

Proof: In the case that one of the failed nodes is the butterfly 
node, the proof is trivial by the encoding method. If the 
horizontal node failed together with the information node $j$, we 
need to show that in each iteration $i$, all of the elements $a_{i'',j}$ 
where $(i'',j) \in B_{i'_j} \setminus \{(i',j)\}$ were decoded in a previous 
iteration. To prove that, note that by the definition of the 
set, if $(i'',j)$ is in $B_{i'_j} \setminus \{(i',j)\}$, then there exists $j' \neq j$ such 
that $i'' = i' \oplus (2^j - 1) \oplus (2^{j'} - 1)$ and $i''(j') = i''(j'-1)$. 
So it is enough to show that for each $j' \neq j$ such that 
$i''(j') = i''(j'-1)$, $a_{i'',j}$ was decoded in a previous iteration. 
We prove this by induction on the iteration $i$. In the base 
case, $i = 0$, and according to Algorithm 1, $i' = 0$ as well, so 
$i'(j') = i'(j'-1)$ for every $j'$. Now by the definition of $i''$, 
$i''(j') \neq i''(j'-1)$, so no such element exists, and the base 
case is proven. 

For the induction step, assume that $i''(j') = i''(j'-1)$. By 
the definition of $i''$, $i'(j') + i'(j'-1) = i''(j') + i''(j'-1) + 
1 = 1$. In addition, for any $j'' \notin \{j, j'\}$, $i'(j'') + i'(j''-1) = 
i''(j'') + i''(j''-1)$. Therefore, according to Algorithm 1, the 
iteration in which $a_{i'',j}$ needs to be decoded differs from $i$ in 
exactly one bit. And since $i'(j') \neq i'(j'-1)$, the value of 
that bit in $i$ is 1, and therefore $i$ is a later iteration, and $a_{i'',j}$ 
was decoded before. So by the induction hypothesis, $a_{i'',j}$ is 
known, and the induction is proven. So, by the encoding of 
the butterfly elements, column $j$ is decoded successfully, and 
the horizontal node can be encoded afterwards. 

In the case that both failed nodes are information nodes, 
the proof is very similar. First we need to show that all of 
the terms in Equation (1) are known when $a_{i_1,j_0}$ is being 
decoded. For each $j' \in [k] \setminus \{j_0, j_1\}$, $a_{i_0,j'}$ is known since it is an 
element of a surviving node. For $a_{i',j'}$ where $(i',j') \in B_{i_{0,j_1}} \setminus 
\{(i_1,j_0), (i_0,j_0), (i_0,j_1)\}$, we use induction on $i$ again. 

First, notice that $(i_0,j_1)$ is actually in $B_{i_{1,j_0}}$. That is since 
according to Algorithm 2, $i_0 = i_1 \oplus (2^{j_0} - 1) \oplus (2^{j_1} - 1)$, 
where the difference between $i_0$ and $i_1$ comes from lines 11-16 
in the Algorithm. Therefore, $i_{1,j_0} = i_1 \oplus (2^{j_0} - 1) = 
i_0 \oplus (2^{j_1} - 1) = i_{0,j_1}$. In addition, by line 11 of Algorithm 
2, $i_0(j_1) = i_0(j_1 - 1)$, and therefore, $B_{i_0,j_1} = \{(i_0,j') : (j_1 - 
j') \bmod k \le \lfloor k/2 \rfloor\}$, which implies that $(i_0,j_0) \in B_{i_{0,j_1}}$ as 
well. The inductive argument follows the same lines as in the 
previous case, and is therefore omitted. 

At this point, we know that all of the terms in Equation (1) are 
known. Now notice that 

$$h_{i_0} + \sum_{j' \in [k] \setminus \{j_0,j_1\}} a_{i_0,j'} = a_{i_0,j_0} + a_{i_0,j_1},$$

and 

$$b_{i_{0,j_1}} + \sum_{(i',j') \in B_{i_{0,j_1}} \setminus \{(i_1,j_0),(i_0,j_0),(i_0,j_1)\}} a_{i',j'} = a_{i_1,j_0} + a_{i_0,j_0} + a_{i_0,j_1},$$

and therefore, $a_{i_1,j_0}$ is decoded correctly. 
After $a_{i_1,j_0}$ is decoded, it can be seen directly by Equation (2) 
that $a_{i_1,j_1}$ can be decoded correctly as well. As for $a_{i_0,j_0}$, it can 
be shown by the same argument that we used for $a_{i_1,j_0}$ that 
it can be decoded successfully. And finally, the decoding of 
$a_{i_0,j_1}$ by Equation (4) also follows immediately. ■ 
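The MDS property can also be checked by brute force for small $k$. The approach is a swapped-in linear-algebra check, not the decoding procedure above. Write every surviving node element as a GF(2) linear form in the $k \cdot 2^{k-1}$ information bits, and require full rank for every pair of erased nodes. In this minimal Python sketch (hypothetical names), setting `extend=False` drops the dark-element additions, and the code then stops being MDS, as the discussion of Figure 1 predicts.

```python
from itertools import combinations


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def butterfly_terms(m, k, extend=True):
    """Terms of b_m; with extend=False the dark-element additions are dropped."""
    terms = []
    for j in range(k):
        i = m ^ ((1 << j) - 1)
        if extend and bit(i, j) == bit(i, j - 1):
            terms.extend((i, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            terms.append((i, j))
    return terms


def rank_gf2(vectors):
    """Rank over GF(2) of integers read as bit vectors, via an XOR basis."""
    basis = {}                                  # leading bit position -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)


def is_mds(k, extend=True):
    """True if any 2 erased nodes (k information, horizontal, butterfly) are recoverable."""
    rows = 1 << (k - 1)

    def unit(i, j):                             # information bit a_{i,j} as a one-hot mask
        return 1 << (i * k + j)

    nodes = [[unit(i, j) for i in range(rows)] for j in range(k)]
    nodes.append([sum(unit(i, j) for j in range(k)) for i in range(rows)])   # h_i
    butterfly = []
    for m in range(rows):
        mask = 0
        for i, j in butterfly_terms(m, k, extend):
            mask ^= unit(i, j)
        butterfly.append(mask)
    nodes.append(butterfly)
    for lost in combinations(range(k + 2), 2):
        eqs = [v for n, node in enumerate(nodes) if n not in lost for v in node]
        if rank_gf2(eqs) < k * rows:
            return False
    return True


print(is_mds(3), is_mds(5), is_mds(3, extend=False))
```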
Lastly, we present the update measure of the Construction. 

Theorem 3. Update: The expected update measure of 
Construction 1 is $\frac{1}{2}\lfloor k/2 \rfloor + 2$, and the worst-case is $\lfloor k/2 \rfloor + 2$. 

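Proof: Let $(i,j) \in [2^{k-1}] \times [k]$ be a uniformly-distributed, 
randomly-picked pair. Besides $B_{i,j}$, the element $(i,j)$ belongs to 
$B_{i,j'}$ exactly for those columns $j'$ with $(j' - j) \bmod k \in 
\{1, \ldots, \lfloor k/2 \rfloor\}$ for which $i(j') = i(j'-1)$, and for each such $j'$ 
the probability that $i(j') = i(j'-1)$ is 1/2. Therefore, counting $B_{i,j}$ 
itself, the expected number of sets $B_{i',j'}$ that contain $(i,j)$ is 
$\frac{1}{2}\lfloor k/2 \rfloor + 1$. In the case that the value of $a_{i,j}$ is changed, 
each of these sets requires the update of a different element in the 
butterfly node, in addition to a single element in the horizontal 
node. Therefore, the expected number of updated elements is 
$\frac{1}{2}\lfloor k/2 \rfloor + 2$. 

In the worst case, consider the update of an element 
$a_{0,j}$, for $j \in [k]$. For each $j' \in [k] \setminus \{j\}$ such that 
$(j' - j) \bmod k \le \lfloor k/2 \rfloor$, we have $i(j') = i(j'-1) = 0$ for $i = 0$, 
and therefore $(0,j) \in B_{0,j'}$. For that reason, $\lfloor k/2 \rfloor + 1$ elements 
of the butterfly node need to be updated, in addition to a single 
element in the horizontal node, and the total is $\lfloor k/2 \rfloor + 2$. No 
element can require more, since at most $\lfloor k/2 \rfloor$ sets other than 
$B_{i,j}$ can contain $(i,j)$. ■ 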
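The expected and worst-case update measures can be tallied directly. This minimal Python sketch (hypothetical names) counts, for every information bit, its horizontal parity plus every butterfly element whose term list contains it. It then reports the exact mean as a `Fraction`, together with the maximum.

```python
from fractions import Fraction


def bit(i, j):
    return (i >> j) & 1 if j >= 0 else 0


def butterfly_terms(m, k):
    terms = []
    for j in range(k):
        i = m ^ ((1 << j) - 1)
        if bit(i, j) == bit(i, j - 1):
            terms.extend((i, c) for c in range(k) if (j - c) % k <= k // 2)
        else:
            terms.append((i, j))
    return terms


def update_counts(k):
    """For every information bit, the number of parity bits that change with it."""
    rows = 1 << (k - 1)
    count = {(i, j): 1 for i in range(rows) for j in range(k)}   # its horizontal parity h_i
    for m in range(rows):
        for cell in butterfly_terms(m, k):                        # each b_m that contains it
            count[cell] += 1
    return count


for k in (3, 5, 7, 9):
    c = update_counts(k)
    print(k, Fraction(sum(c.values()), len(c)), max(c.values()))
```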

V. Conclusions 

In this paper, we described a construction of repair-optimal 
MDS array codes whose array elements are bits, and whose 
operations are performed over GF(2). Several problems are 
still open in this topic. First, it would be interesting to find 
out whether there exist repair-optimal MDS codes with a lower 
update measure. Second, a generalization of the construction to 
more parity nodes could be very useful. And finally, it would 
be important to know whether such codes exist whose number 
of rows is polynomial in the number of columns. 